The volume of scientific publications has been increasing rapidly across a wide range of scientific fields and disciplines. As a result, scientists must deal with an increasingly thick layer of transient information and must distill valuable knowledge, in a timely way, from an overwhelming amount of input that carries more noise and uncertainty.
The core knowledge of a scientific field is largely documented in its literature in the form of peer-reviewed and non-peer-reviewed publications. Peer-reviewed publications are considered of higher value than non-peer-reviewed ones because the science they report has been vetted by peer scientists and is therefore more likely to have met rigorous and stringent criteria. This description will primarily focus on peer-reviewed publications; however, those skilled in the art will appreciate that the method described herein is equally applicable to non-peer-reviewed publications and other types of text such as patent applications and technical reports.
A body of scientific literature serves two primary roles in the advancement of science: an archival role and a communicative role. A well-known conception in the study of science is that scientific literature consists of two principal components: one classic and the other transient. The classic component contains the well-documented and well-established knowledge of a scientific field, that is, the collective domain knowledge of the underlying scientific community. The classic component forms the backbone of the domain knowledge because it represents the fundamental value of the scientific domain, including its principles, methodologies, and major claims. In contrast, the transient component represents the most recent attachment to the backbone structure. It includes the latest publications of new results and new findings. Such transient layers are sometimes known as research fronts. The nature of such an attachment remains transient until the new publications have been subject to the selection of the scientific community. The selection can lead to one of three outcomes: acceptance, rejection, or indifference. Both the structure of the backbone and the outcomes regarding the research fronts remain subject to further change as new evidence becomes available or new theories become predominant. The degree of selection is often measured in terms of the citations received, i.e., the number of times subsequently published articles make references to the work. The more citations a work receives, the greater its perceived impact on the scientific field and, therefore, the more value it is considered to add to the development of scientific knowledge.
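By way of illustration only, the citation measure described above can be sketched as a simple count over reference lists. The sketch below is not taken from the described method; the function name `count_citations`, the input layout (a mapping from each citing publication to the publications it references), and the sample identifiers are all hypothetical assumptions chosen for the example.

```python
from collections import Counter


def count_citations(references):
    """Count how many distinct publications cite each work.

    `references` is a hypothetical input layout: a dict mapping the ID of a
    citing publication to the list of publication IDs it references.
    """
    counts = Counter()
    for citing, cited_list in references.items():
        # Use a set so a reference listed twice in one paper counts once.
        for cited in set(cited_list):
            # Ignore a publication that lists itself.
            if cited != citing:
                counts[cited] += 1
    return counts


# Made-up sample data, for illustration only.
references = {
    "paper_C": ["paper_A", "paper_B"],
    "paper_D": ["paper_A"],
    "paper_E": ["paper_A", "paper_B", "paper_D"],
}

citation_counts = count_citations(references)

# List works from most cited to least cited.
for paper, n in citation_counts.most_common():
    print(paper, n)
```

Under these assumptions, a work that is referenced by more later publications ranks higher, which corresponds to the greater perceived impact discussed above. A practical system would likely also need to normalize for publication age and field, which this sketch does not attempt.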
Systematic reviews, comprehensive surveys, and meta-analytic studies are among the most common and effective means used by scientists, scholars, and people with similar needs to maintain their understanding of their fields. These methods share the goals of identifying significant contributions, potentially challenging issues, and future research directions. They all rely on scientific literature as a primary source of input and try to clarify the state of the art. On the other hand, they have some inherent shortcomings: they are time consuming, labor intensive, and biased by the views of a few. As a result, such reviews are often separated by extensive periods of time. These reviews and surveys are typically performed by experts. Since experts tend to be specialized in some but not all areas of a field, the coverage can be biased by their own preferences and knowledge.
A new approach to reviewing developments in a scientific field, without the bias and time demands of prior art approaches, is desired. In particular, a technique is desired whereby quantitative, as opposed to qualitative, reviews of a scientific field may be generated automatically with high scalability and at medium to low cost. The present invention is designed to address these needs in the art.